
arXiv.org e-Print Archive


On Budget-Feasible Mechanism Design for Symmetric Submodular Objectives

Author: A Borodin
A Gupta
A Krause
A Kulik
AA Ageev
AA Schäffer
C Chekuri
G Amanatidis
GL Nemhauser
J Kleinberg
J Lee
LA Wolsey
M Queyranne
M Sviridenko
NA Cressie
R Myerson
S Fujishige
T Horel
U Feige
WF Caselton
Publication date: 09/10/2017

We study a class of procurement auctions with a budget constraint, where an auctioneer is interested in buying resources or services from a set of agents. Ideally, the auctioneer would like to select a subset of the resources so as to maximize his valuation function, without exceeding a given budget. However, as the resources are owned by strategic agents, our overall goal is to design mechanisms that are truthful, budget-feasible, and obtain a good approximation to the optimal value. Budget-feasibility creates additional challenges, making several approaches inapplicable in this setting. Previous results on budget-feasible mechanisms have considered mostly monotone valuation functions. In this work, we mainly focus on symmetric submodular valuations, a prominent class of non-monotone submodular functions that includes cut functions. We begin with a purely algorithmic result, obtaining a $\frac{2e}{e-1}$-approximation for maximizing symmetric submodular functions under a budget constraint. We view this as a standalone result of independent interest, as it is the best known factor achieved by a deterministic algorithm. We then propose truthful, budget-feasible mechanisms (both deterministic and randomized), paying particular attention to the Budgeted Max Cut problem. Our results significantly improve the known approximation ratios for these objectives, while establishing polynomial running time for cases where only exponential mechanisms were known. At the heart of our approach lies an appropriate combination of local search algorithms with results for monotone submodular valuations, applied to the derived local optima.

Comment: A conference version appears in WINE 201
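To make the objective in this abstract concrete, the following minimal sketch evaluates a weighted cut function and finds the best budget-feasible set by exhaustive search on a tiny instance. It is not the paper's algorithm or mechanism: the graph, the per-node costs, and the function names `cut_value` and `best_budgeted_cut` are all assumptions made for illustration, and the brute-force search only serves as a reference optimum.

```python
from itertools import combinations

def cut_value(edges, S):
    """Total weight of edges with exactly one endpoint in S."""
    S = set(S)
    return sum(w for u, v, w in edges if (u in S) != (v in S))

def best_budgeted_cut(nodes, edges, cost, budget):
    """Exhaustive search over subsets whose total cost fits the budget.

    Exponential in len(nodes); intended only for tiny illustrative inputs.
    """
    best_set, best_val = frozenset(), 0
    for r in range(len(nodes) + 1):
        for S in combinations(nodes, r):
            if sum(cost[v] for v in S) <= budget:
                val = cut_value(edges, S)
                if val > best_val:
                    best_set, best_val = frozenset(S), val
    return best_set, best_val

# Hypothetical instance: a 4-cycle with weighted edges and per-node costs.
nodes = ["a", "b", "c", "d"]
edges = [("a", "b", 3), ("b", "c", 1), ("c", "d", 2), ("a", "d", 1)]
cost = {"a": 2, "b": 1, "c": 2, "d": 1}

if __name__ == "__main__":
    print(cut_value(edges, {"a"}), cut_value(edges, {"b", "c", "d"}))
    print(best_budgeted_cut(nodes, edges, cost, budget=2))
```

The first printed pair illustrates symmetry (a set and its complement have the same cut value), and the value of the full node set is zero, which shows why cut functions are non-monotone.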

arXiv.org e-Print Archive

Local Guarantees in Graph Cuts and Clustering

Author: A Ben-Dor
A Wirth
AA Schäffer
D Monderer
DS Johnson
ED Demaine
G Christodoulou
HP Kriegel
N Ailon
N Ailon
N Bansal
N Bansal
P Symeonidis
V Filkov
Z Svitkina
Publication date: 02/04/2017

Correlation Clustering is an elegant model that captures fundamental graph cut problems such as Min $s$-$t$ Cut, Multiway Cut, and Multicut, extensively studied in combinatorial optimization. Here, we are given a graph with edges labeled $+$ or $-$ and the goal is to produce a clustering that agrees with the labels as much as possible: $+$ edges within clusters and $-$ edges across clusters. The classical approach towards Correlation Clustering (and other graph cut problems) is to optimize a global objective. We depart from this and study local objectives: minimizing the maximum number of disagreements for edges incident on a single node, and the analogous max min agreements objective. This naturally gives rise to a family of basic min-max graph cut problems. A prototypical representative is Min Max $s$-$t$ Cut: find an $s$-$t$ cut minimizing the largest number of cut edges incident on any node. We present the following results: (1) an $O(\sqrt{n})$-approximation for the problem of minimizing the maximum total weight of disagreement edges incident on any node (thus providing the first known approximation for the above family of min-max graph cut problems), (2) a remarkably simple $7$-approximation for minimizing local disagreements in complete graphs (improving upon the previous best known approximation of $48$), and (3) a $1/(2+\varepsilon)$-approximation for maximizing the minimum total weight of agreement edges incident on any node, hence improving upon the $1/(4+\varepsilon)$-approximation that follows from the study of approximate pure Nash equilibria in cut and party affiliation games.
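The local objective described in this abstract can be illustrated with a short sketch that counts, for each node, the incident edges whose label disagrees with a given clustering, and then takes the maximum over nodes. This only evaluates the objective for a fixed clustering; it is not any of the approximation algorithms from the paper. The edge list, the clustering, and the function names are hypothetical, and edges are unweighted for simplicity.

```python
from collections import defaultdict

def local_disagreements(edges, cluster_of):
    """Count, per node, incident edges whose label disagrees with the clustering.

    A "+" edge disagrees when its endpoints are in different clusters;
    a "-" edge disagrees when its endpoints share a cluster.
    """
    bad = defaultdict(int)
    for u, v, sign in edges:
        same = cluster_of[u] == cluster_of[v]
        if (sign == "+" and not same) or (sign == "-" and same):
            bad[u] += 1
            bad[v] += 1
    return dict(bad)

def max_local_disagreement(edges, cluster_of):
    """The min-max objective value: worst disagreement count over all nodes."""
    per_node = local_disagreements(edges, cluster_of)
    return max(per_node.values(), default=0)

# Hypothetical labeled graph and a candidate clustering (node -> cluster id).
edges = [(1, 2, "+"), (2, 3, "+"), (1, 3, "-"), (3, 4, "+")]
clustering = {1: 0, 2: 0, 3: 0, 4: 1}

if __name__ == "__main__":
    print(local_disagreements(edges, clustering))
    print(max_local_disagreement(edges, clustering))
```

In this example the global objective counts two disagreeing edges in total, while the local objective reports that node 3 alone is incident to both of them.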

Ensemble approach to predict specificity determinants: benchmarking and validation

Author: A Carro
A del Sol
AA Schäffer
Anna R Panchenko
B Reva
DP Brown
E Marchiori
HM Berman
I Kononenko
IM Wallace
J Pei
JA Capra
JE Donald
K Mizuguchi
K Ye
L Mirny
N Krishnamurthy
O Lichtarge
OV Kalinina
OV Kalinina
P Marttinen
RF Doolittle
RM Ward
S Chakrabarti
S Chakrabarti
S Ohno
Saikat Chakrabarti
SS Hannenhalli
W Pirovano
WL DeLano
X Gu
X Gu
Publication venue: BioMed Central
Publication date: 01/01/2009

Background: It is extremely important and challenging to identify the sites that are responsible for functional specification or diversification in protein families. In this study, a rigorous comparative benchmarking protocol was employed to provide a reliable evaluation of methods that predict specificity determining sites. Subsequently, the three best-performing methods were applied to identify new potential specificity determining sites through an ensemble approach and common agreement of their prediction results. Results: It was shown that the analysis of structural characteristics of predicted specificity determining sites might provide the means to validate their prediction accuracy. For example, we found that for smaller distances it holds true that the more reliable the prediction method is, the closer predicted specificity determining sites are to each other and to the ligand. Conclusion: We observed certain similarities of structural features between predicted and actual subsites which might point to their functional relevance. We speculate that the majority of the identified potential specificity determining sites might be indirectly involved in specific interactions and could be ideal targets for mutagenesis experiments.

Springer - Publisher Connector

Histopathologic aspects in Plagioscion squamosissimus (HECKEL, 1940) induced by Neoechinorhynchus veropesoi, metacestodes and anisakidae juveniles

Author: Amato JFR
Amin OM
Barthem RB
Casatti L
Dezfuli BS
Eiras JC
Feist SW
Ferraz de Lima CLB
Huizinga HW
Malta JCO
Margolis L
Martins ML
Melo FTV
Miyazaki T
Moravec F
Rego AA
Saraiva AMPM
Schäffer GV
Silva JP
Sugawara Y
Thatcher VE
Thatcher VE
Urawa S
Urawa S
Woo PTK
Woo PTK
Publication venue: 'FapUNIFESP (SciELO)'

Composition-based statistics and translated nucleotide searches: Improving the TBLASTN module of BLAST

Author: AA Schäffer
AL Delcher
Alejandro A Schäffer
B Brejová
B Hao
BG Barrell
DJ States
E Birney
E Birney
E Boy-Marcotte
E Boy-Marcotte
E Halperin
E Michael Gertz
EM Gertz
F Damak
F Zinoni
G Macino
H Peltola
IG Young
J Hein
J Hein
JC Wootton
L Knecht
M Gribskov
MS Boguski
MS Boguski
MS Gelfand
O Gotoh
P Steneberg
P Steneberg
R Durbin
Richa Agarwala
S Henikoff
S Kurtz
SA Chervitz
SC Low
SF Altschul
SF Altschul
SF Altschul
SF Altschul
Stephen F Altschul
TF Smith
W Gish
WJ Kent
WR Pearson
WR Pearson
WR Pearson
X Guan
X Huang
Yi-Kuo Yu
YK Yu
YK Yu
Z Zhang
Z Zhang
Publication venue: BioMed Central
Publication date: 01/01/2006

BACKGROUND: TBLASTN is a mode of operation for BLAST that aligns protein sequences to a nucleotide database translated in all six frames. We present the first description of the modern implementation of TBLASTN, focusing on new techniques that were used to implement composition-based statistics for translated nucleotide searches. Composition-based statistics use the composition of the sequences being aligned to generate more accurate E-values, which allows for a more accurate distinction between true and false matches. Until recently, composition-based statistics were available only for protein-protein searches. They are now available as a command line option for recent versions of TBLASTN and as an option for TBLASTN on the NCBI BLAST web server. RESULTS: We evaluate the statistical and retrieval accuracy of the E-values reported by a baseline version of TBLASTN and by two variants that use different types of composition-based statistics. To test the statistical accuracy of TBLASTN, we ran 1000 searches using scrambled proteins from the mouse genome and a database of human chromosomes. To test retrieval accuracy, we modernize and adapt to translated searches a test set previously used to evaluate the retrieval accuracy of protein-protein searches. We show that composition-based statistics greatly improve the statistical accuracy of TBLASTN, at a small cost to the retrieval accuracy. CONCLUSION: TBLASTN is widely used, as it is common to wish to compare proteins to chromosomes or to libraries of mRNAs. Composition-based statistics improve the statistical accuracy, and therefore the reliability, of TBLASTN results. The algorithms used by TBLASTN are not widely known, and some of the most important are reported here. The data used to test TBLASTN are available for download and may be useful in other studies of translated search algorithms
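The phrase "translated in all six frames" can be illustrated with a minimal sketch: three reading-frame offsets on the forward strand and three on the reverse complement. This is not TBLASTN's implementation and does not touch composition-based statistics. It assumes the standard genetic code, plain A/C/G/T input, and illustrative function names; codons containing other characters are translated as "X", and "*" marks a stop codon.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(dna):
    """Complement each base, then reverse the strand."""
    return dna.translate(COMPLEMENT)[::-1]

def translate(dna):
    """Translate complete codons only; unknown codons become 'X'."""
    return "".join(
        CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, len(dna) - 2, 3)
    )

def six_frame_translations(dna):
    """Return translations keyed by frame: +1, +2, +3, -1, -2, -3."""
    dna = dna.upper()
    frames = {}
    for strand, seq in (("+", dna), ("-", reverse_complement(dna))):
        for offset in range(3):
            frames[f"{strand}{offset + 1}"] = translate(seq[offset:])
    return frames

if __name__ == "__main__":
    for frame, protein in six_frame_translations("ATGGCCTAA").items():
        print(frame, protein)
```

A protein query would then be compared against each of the six translated sequences rather than against the nucleotide sequence itself.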

Springer - Publisher Connector

Digital Repository @ Iowa State University (ISU)

Pairwise statistical significance of local sequence alignment using multiple parameter sets and empirical justification of parameter set change penalty

Author: A Agrawal
A Agrawal
AA Schäffer
AK Hartmann
Ankit Agrawal
AY Mitrophanov
CA Orengo
J Rocha
M Kschischo
M Pagni
ML Sierk
MS Waterman
P Bucher
PH Sellers
R Mott
R Mott
R Mott
R Olsen
RF Mott
S Grossmann
S Karlin
S Kotz
S Sheetlin
S Wolfsheimer
SE Brenner
SF Altschul
SF Altschul
SF Altschul
SF Altschul
SF Altschul
TF Smith
WR Pearson
WR Pearson
WR Pearson
WR Pearson
WR Pearson
X Huang
X Huang
Xiaoqiu Huang
YK Yu
Publication venue: BioMed Central
Publication date: 01/01/2009

Background: Accurate estimation of the statistical significance of a pairwise alignment is an important problem in sequence comparison. Recently, a comparative study of pairwise statistical significance with database statistical significance was conducted. In this paper, we extend the earlier work on pairwise statistical significance by incorporating the use of multiple parameter sets. Results: Results for a knowledge discovery application of homology detection reveal that using multiple parameter sets for pairwise statistical significance estimates gives better coverage than using a single parameter set, at least at some error levels. Further, the results of pairwise statistical significance using multiple parameter sets are shown to be significantly better than database statistical significance estimates reported by BLAST and PSI-BLAST, and comparable to, and at times significantly better than, SSEARCH. Using non-zero parameter set change penalty values gives better performance than a zero penalty. Conclusion: The fact that the homology detection performance does not degrade when using multiple parameter sets is strong evidence for the validity of the assumption that the alignment score distribution follows an extreme value distribution even when using multiple parameter sets. The parameter set change penalty is a useful parameter for alignment using multiple parameter sets. Pairwise statistical significance using multiple parameter sets can be effectively used to determine the relatedness of one pair, or a few pairs, of sequences without performing a time-consuming database search.

Springer - Publisher Connector

Public Library of Science (PLOS)

Accelerated Profile HMM Searches

Author: A Jacob
A Krogh
A Milosavljević
A Wozniak
AA Schäffer
B Rekapalli
C Camacho
DR Horn
EK Freyhult
EM Gertz
G Chukkapalli
GA Price
J Landman
JP Walters
JP Walters
K Karplus
LR Rabiner
LS Johnson
M Farrar
M Madera
R Durbin
RD Finn
RP Maddimsetty
S Derrien
S Hunter
S Johnson
Sean R. Eddy
SF Altschul
SF Altschul
SF Altschul
SF Altschul
SJ Melnikoff
SR Eddy
T Oliver
T Rognes
T Rognes
TF Smith
V Chaudhary
V Sachdeva
William R. Pearson
WN Grundy
WR Pearson
Y Sun
Y Sun
YK Yu
Publication venue: Public Library of Science
Publication date: 01/01/2011

Profile hidden Markov models (profile HMMs) and probabilistic inference methods have made important contributions to the theory of sequence database homology search. However, practical use of profile HMM methods has been hindered by the computational expense of existing software implementations. Here I describe an acceleration heuristic for profile HMMs, the “multiple segment Viterbi” (MSV) algorithm. The MSV algorithm computes an optimal sum of multiple ungapped local alignment segments using a striped vector-parallel approach previously described for fast Smith/Waterman alignment. MSV scores follow the same statistical distribution as gapped optimal local alignment scores, allowing rapid evaluation of significance of an MSV score and thus facilitating its use as a heuristic filter. I also describe a 20-fold acceleration of the standard profile HMM Forward/Backward algorithms using a method I call “sparse rescaling”. These methods are assembled in a pipeline in which high-scoring MSV hits are passed on for reanalysis with the full HMM Forward/Backward algorithm. This accelerated pipeline is implemented in the freely available HMMER3 software package. Performance benchmarks show that the use of the heuristic MSV filter sacrifices negligible sensitivity compared to unaccelerated profile HMM searches. HMMER3 is substantially more sensitive and 100- to 1000-fold faster than HMMER2. HMMER3 is now about as fast as BLAST for protein searches

CiteSeerX

High diversity of picornaviruses in rats from different continents revealed by deep sequencing

Author: Aljofan M
Altschul SF
Altschul SF
Benson DA
Bernhart SH
Boisvert S
Chopra G
Coghlan ML
de Groot RJ
Drexler JF
Drexler JF
Easterbrook JD
Edgar RC
Edgar RC
Firth C
Friis-Nielsen J
Gatherer D
Geng H
Griffiths-Jones S
Günther S
Hahn H
Himsworth CG
Holtz LR
Honkavuori KS
Hugh-Jones ME
Hunter AA
Huson DH
Jirintai S
Jones MS
Koressaar T
Ksiazek TG
Kurtz S
Langmead B
Li H
Lindgreen S
Maurice H
Meerburg BG
Mirarab S
Mirarab S
Murray DC
Ng TFF
Nielsen ACY
Oleszak EL
Palacios G
Phan TG
Phan TG
Pickett BE
Punta M
Rice P
Sachsenröder J
Schein MW
Schäffer AA
Spyrou V
Stamatakis A
Taberlet P
Tapparel C
Taylor PG
Truong QL
Victoria JG
Will S
Wolf S
Yu J-M
Zeale MRK
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2016

Outbreaks of zoonotic diseases in humans and livestock are not uncommon, and an important component in containment of such emerging viral diseases is rapid and reliable diagnostics. Such methods are often PCR-based and hence require the availability of sequence data from the pathogen. Rattus norvegicus (R. norvegicus) is a known reservoir for important zoonotic pathogens. Transmission may be direct via contact with the animal, for example, through exposure to its faecal matter, or indirectly mediated by arthropod vectors. Here we investigated the viral content in rat faecal matter (n=29) collected from two continents by analyzing 2.2 billion next-generation sequencing reads derived from both DNA and RNA. Among other virus families, we found sequences from members of the Picornaviridae to be abundant in the microbiome of all the samples. Here we describe the diversity of the picornavirus-like contigs including near-full-length genomes closely related to the Boone cardiovirus and Theiler's encephalomyelitis virus. From this study, we conclude that picornaviruses within R. norvegicus are more diverse than previously recognized. The virome of R. norvegicus should be investigated further to assess the full potential for zoonotic virus transmission

Copenhagen University Research Information System